Successful attack on permutation-parity-machine-based neural cryptography 
An algorithm is presented which implements a probabilistic attack on the key-exchange
protocol based on permutation parity machines. Instead of imitating the synchronization
of the communicating partners, the strategy consists of a Monte Carlo method to sample
the space of possible weights during inner rounds and an analytic approach to convey the
extracted information from one outer round to the next one. The results show that the
protocol under attack fails to synchronize faster than an eavesdropper using this algorithm.

Interacting feed-forward neural networks can synchronize by mutual learning [HQ. If two
networks A and B are trained with examples consisting of random inputs and the
corresponding output of the other one, their weight vectors converge. In the case of tree
parity machines (TPMs) this mutual synchronization of A and B requires fewer examples than
training a third network E successfully 0-0]. Based on this effect a TPM-based neural
key-exchange protocol has been developed [8]-[11] and shown to be useful in embedded
devices [12], [13] as well as being sufficiently secure against several attacks @-

Recently, a variant of neural cryptography has been presented in Ref. [14] which uses
permutation parity machines (PPMs) [15] instead of TPMs. This change increases the
robustness of the key-exchange protocol against the attacks which have been tried on the
TPM-based algorithm before [16]-[18]. However, it also reduces the number of possible
values per weight from $2L + 1 \geq 3$ to 2, so that other attacks become more feasible.
This is especially true for the probabilistic attack, which has been suggested by
Ref. [19], but not implemented up to now. We have used this idea and developed an attack
method especially suited for PPM-based neural cryptography. In this Rapid Communication,
we describe our attack and present results indicating its success.

A PPM is a neural network consisting of two layers: There are $K$ hidden units in the
first layer, each of which has an independent receptive field of size $N$, and only one
neuron in the second layer. Its $KN$ inputs $x_{i,j}$ with indices $i = 1, \ldots, N$ and
$j = 1, \ldots, K$ are binary: $x_{i,j} \in \{0, 1\}$. In order to simplify the notation
they are combined into input vectors $\mathbf{x}_j = (x_{1,j}, \ldots, x_{N,j})$ or the
input matrix $X = (\mathbf{x}_1, \ldots, \mathbf{x}_K)$ where appropriate.

The weights $w_{i,j}$ are selected elements from the state vector $\mathbf{s}$ of the
PPM, which consists of $G \gg KN$ elements $s_i \in \{0, 1\}$. For that purpose a matrix
$\pi$ of size $N \times K$ containing numbers $\pi_{i,j} \in \{1, \ldots, G\}$ is used,
so that

$$w_{i,j} = s_{\pi_{i,j}}. \tag{1}$$

The weight vector $\mathbf{w}_j$ then determines the mapping from the input vector
$\mathbf{x}_j$ to the state $\sigma_j \in \{0, 1\}$ of the $j$-th hidden unit. First, the
vector local field $\mathbf{h}_j$ is calculated as the one-by-one exclusive-or (XOR) of
inputs and weights,

$$h_{i,j} = x_{i,j} \oplus w_{i,j}. \tag{2}$$

The state of the hidden unit then follows as

$$\sigma_j = \Theta\left(h_j - \frac{N}{2}\right), \tag{3}$$

where

$$h_j = \sum_{i=1}^{N} h_{i,j} \tag{4}$$

denotes the scalar local field and $\Theta(x)$, equal to 1 for $x > 0$ and 0 otherwise,
is the Heaviside step function. Finally, the total output of the PPM is calculated as the
binary state $\tau \in \{0, 1\}$ of the single unit in the second layer which is set to
the parity

$$\tau = \bigoplus_{j=1}^{K} \sigma_j \tag{5}$$

of the hidden states $\sigma_j$.

When implementing the synchronization task, two PPMs (A and B) designed with the same
settings (i.e., with the same $N$, $K$, and $G$) are provided. The synchronization will
succeed after several inner and outer rounds, which are described below.

For each inner round, the elements of the matrix $\pi$ and the input vectors
$\mathbf{x}_j$ are drawn randomly and independently from their corresponding value sets.
These quantities are provided publicly to all PPMs, which includes even an attacker E.
Then both A and B compute their outputs $\tau^A$ and $\tau^B$, and if they agree
($\tau^A = \tau^B$) they store the states $\sigma_1^A$ and $\sigma_1^B$ of their first
hidden units in a buffer, which remains private for each PPM.

Thus, the available information is as follows: public and common input vectors
$\mathbf{x}_j$ and the matrix $\pi$; public, but not necessarily equal outputs $\tau^A$
and $\tau^B$; private, not necessarily equal state vectors $\mathbf{s}^A$ and
$\mathbf{s}^B$; and private, not necessarily equal states of the hidden units
$\sigma_j^A$ and $\sigma_j^B$.
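
The output computation and a single inner round can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `ppm_forward` and `inner_round` are hypothetical, indices into the state vector are 0-based (the text counts from 1 to $G$), and the local field is taken as the XOR of inputs and weights with a hidden unit active only if strictly more than half of its local field entries are 1.

```python
import random


def ppm_forward(s, pi, X):
    """Return (sigma, tau) for state vector s, index matrix pi and input X.

    pi and X are N x K nested lists. pi holds 0-based indices into s,
    whereas the text numbers the state bits from 1 to G.
    """
    N, K = len(X), len(X[0])
    sigma = []
    for j in range(K):
        # Scalar local field: count of positions where input XOR weight is 1.
        h_j = sum(X[i][j] ^ s[pi[i][j]] for i in range(N))
        # Heaviside step: active only if strictly more than half are 1.
        sigma.append(1 if 2 * h_j > N else 0)
    tau = sum(sigma) % 2  # parity of the hidden states
    return sigma, tau


def inner_round(s_a, s_b, buf_a, buf_b, N, K, rng):
    """Draw public pi and X, compare outputs, and fill the private buffers."""
    G = len(s_a)
    pi = [[rng.randrange(G) for _ in range(K)] for _ in range(N)]
    X = [[rng.randrange(2) for _ in range(K)] for _ in range(N)]
    sig_a, tau_a = ppm_forward(s_a, pi, X)
    sig_b, tau_b = ppm_forward(s_b, pi, X)
    if tau_a == tau_b:
        # Only the state of the first hidden unit is stored.
        buf_a.append(sig_a[0])
        buf_b.append(sig_b[0])
    return pi, X, tau_a, tau_b
```

The return value of `inner_round` contains exactly the public quantities ($\pi$, $X$, $\tau^A$, $\tau^B$), while the buffers and state vectors stay private to each party.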

The inner rounds are repeated until the buffers where $\sigma_1^A$ and $\sigma_1^B$ are
stored reach size $G$. Then, an outer round is completed and each buffer becomes the new
state vector in the corresponding machine, substituting the old one. The dynamics of the
PPMs are such that after each outer round the state vectors $\mathbf{s}^A$,
$\mathbf{s}^B$ tend to be more alike, eventually reaching full synchronization
$\mathbf{s}^A = \mathbf{s}^B$. The synchronization time $t_s$ measured in the number of
outer rounds is a random variable, as it depends on randomly chosen initial conditions
and inputs. However, its mean value rises in a polynomial fashion with increasing size
$N$ of the input vector as well as growing size $G$ of the state vector $\mathbf{s}$ [la ].
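
The outer-round dynamics can be sketched as follows. This is a hedged illustration: the names `hidden_and_output`, `outer_round` and `synchronization_time` are hypothetical, the cap `max_outer` is an assumption added to guarantee termination, and 0-based indices are used for the state vector.

```python
import random


def hidden_and_output(s, pi, X):
    """Hidden states and parity output of one PPM (0-based indices in pi)."""
    N, K = len(X), len(X[0])
    sigma = [
        1 if 2 * sum(X[i][j] ^ s[pi[i][j]] for i in range(N)) > N else 0
        for j in range(K)
    ]
    return sigma, sum(sigma) % 2


def outer_round(s_a, s_b, N, K, rng):
    """Repeat inner rounds until both buffers hold G bits; return them."""
    G = len(s_a)
    buf_a, buf_b = [], []
    while len(buf_a) < G:
        pi = [[rng.randrange(G) for _ in range(K)] for _ in range(N)]
        X = [[rng.randrange(2) for _ in range(K)] for _ in range(N)]
        sig_a, tau_a = hidden_and_output(s_a, pi, X)
        sig_b, tau_b = hidden_and_output(s_b, pi, X)
        if tau_a == tau_b:
            buf_a.append(sig_a[0])
            buf_b.append(sig_b[0])
    return buf_a, buf_b  # the buffers become the new state vectors


def synchronization_time(N, K, G, rng, max_outer=1000):
    """Number of outer rounds until s_a == s_b, or None if the cap is hit."""
    s_a = [rng.randrange(2) for _ in range(G)]
    s_b = [rng.randrange(2) for _ in range(G)]
    for t in range(1, max_outer + 1):
        s_a, s_b = outer_round(s_a, s_b, N, K, rng)
        if s_a == s_b:
            return t
    return None
```

Calling `synchronization_time(N, K, G, random.Random(seed))` for many seeds gives a sample of the random variable $t_s$ described above.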

Previously reported attacks on PPMs tried to mimic the behavior of the synchronizing
networks by using a single machine or an ensemble [14]. They showed poor performance in
guessing $\mathbf{s}^A$ correctly. Namely, for the attacks on PPMs with $K = 2$ and
$G = 128$ analyzed in Ref. [14], the probability of success did not exceed $10^{-5}$.

In the following, we present the description of a different attack strategy. It does not
attempt to mimic the synchronization process, but first guesses the state vector of A
(or B) during an outer round and subsequently reproduces A's (or B's) behavior during the
given round, so that a fair guess of the bits stored in the buffer, and thus of the
subsequent $\mathbf{s}^A$ for the next outer round, can be made.

Some notation is introduced. The synchronizing parties A and B are eavesdropped by a
third agent E, which implements its own PPM with state vector $\mathbf{s}^E$ and output
$\tau^E$. Additionally, the attacker uses a probabilistic state vector
$\mathbf{p}^E = (p_1, \ldots, p_G)^T$ to describe its knowledge about A's state vector
$\mathbf{s}^A$. Each element $p_i$ is an approximation of the marginal probability
$P(s_i^A = 0 \mid D)$ that the $i$-th bit of $\mathbf{s}^A$ is 0 given all data $D$
observed by E before, i.e., inputs and outputs of A and B, which have already been
transmitted over the public channel.

At the beginning of the probabilistic attack previous information about $\mathbf{s}^A$
is not available. Therefore, E starts with a neutral hypothesis and all $p_i$ are
initialized with the prior probability $P(s_i = 0) = 1/2$.

In each inner round an input $X$ and a matrix $\pi$ are provided to all PPMs. Then A and
B calculate their outputs and communicate them publicly. This enables E to update
$\mathbf{p}^E$ based on the observed data $X$, $\pi$, and $\tau^A$. For that purpose the
posterior probability $P(s_i = 0 \mid \mathbf{p}^E, X, \pi, \tau^A)$ is estimated using a
Monte Carlo approach, which is similar to approximate Bayesian computation [20].

This works by generating $M$ state vectors $\mathbf{s}^E$ which are compatible with the
current observation as well as prior knowledge obtained in previous rounds. The $G$
elements of a candidate $\mathbf{s}^E$ are sampled independently from the Bernoulli
distribution with probabilities $P(s_i = 0) = p_i$ and $P(s_i = 1) = 1 - p_i$. Of course,
it is only necessary to draw bits $s_i$ which are selected as weights by $\pi$. All
others can be omitted without affecting the result. This shortcut speeds the sampling up
considerably if $G \gg KN$. Then $\mathbf{s}^E$ is plugged into E's PPM together with $X$
and $\pi$ in order to calculate $\tau^E$. If E's output matches A's, $\tau^E = \tau^A$,
the candidate is stored; if not, it is dismissed. This procedure goes on until $M$ valid
state vectors $\mathbf{s}^E$ have been produced.
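
The sampling step is a rejection sampler and might be sketched as below. The function name `sample_candidates` and the parameter `max_failures` are hypothetical; the latter is an assumption that stops the loop after that many consecutive rejected candidates. Candidates are stored as dictionaries holding only the bits selected by $\pi$ (0-based indices), which reflects the shortcut mentioned in the text.

```python
import random


def sample_candidates(p, pi, X, tau_a, M, max_failures, rng):
    """Rejection-sample up to M weight assignments compatible with tau_a.

    p[i] approximates P(s_i = 0). Each candidate is a dict mapping the
    0-based indices selected by pi to sampled bits. The list is shorter
    than M if max_failures consecutive candidates were rejected.
    """
    N, K = len(X), len(X[0])
    used = sorted({pi[i][j] for i in range(N) for j in range(K)})
    accepted, failures = [], 0
    while len(accepted) < M and failures < max_failures:
        # Draw only the bits that pi actually selects as weights.
        cand = {i: 0 if rng.random() < p[i] else 1 for i in used}
        sigma = [
            1 if 2 * sum(X[i][j] ^ cand[pi[i][j]] for i in range(N)) > N else 0
            for j in range(K)
        ]
        if sum(sigma) % 2 == tau_a:
            accepted.append(cand)  # E's output matches A's
            failures = 0
        else:
            failures += 1  # candidate is dismissed
    return accepted
```

Because an index may appear several times in $\pi$, each distinct index is drawn once per candidate so that repeated weights stay consistent.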

Afterward, the desired marginal posterior probability
$P(s_i = 0 \mid \mathbf{p}^E, X, \pi, \tau^A)$ can be estimated as the relative frequency
of $s_i = 0$ in the sample. The result is then used to update all $p_i$ which have been
selected as weights in the current round. The other elements of $\mathbf{p}^E$ remain
unchanged, because the attacker gained no information about the corresponding parts of
$\mathbf{s}^A$. Of course, this computation is repeated for the next inner round.
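
The update of $\mathbf{p}^E$ from the accepted sample can be illustrated with a short sketch. The function name `update_probabilities` and the representation of candidates as dictionaries from (0-based) bit index to sampled value are assumptions made for this example.

```python
def update_probabilities(p, samples):
    """Return a copy of p where each sampled index holds the relative
    frequency of s_i = 0 among the accepted candidates.

    samples is a list of dicts {index: bit}; all dicts share the same keys.
    Indices that were not selected in this round keep their old value.
    """
    new_p = list(p)
    if not samples:
        return new_p
    for i in samples[0]:
        zeros = sum(1 for cand in samples if cand[i] == 0)
        new_p[i] = zeros / len(samples)
    return new_p


# Four accepted candidates touching bits 2 and 5 of an 8-bit state vector.
samples = [{2: 0, 5: 1}, {2: 0, 5: 0}, {2: 1, 5: 1}, {2: 0, 5: 1}]
p_new = update_probabilities([0.5] * 8, samples)
# p_new[2] is 0.75, p_new[5] is 0.25, all other entries stay at 0.5
```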

As the space of all weight matrices $W = (\mathbf{w}_1, \ldots, \mathbf{w}_K)$ is of
size $2^{NK}$, approximately $2^{NK-1}$ of them are compatible with a given $X$, $\pi$,
and $\tau^A$. Thus if the sampling algorithm generates $M > 2^{NK-1}$ state vectors, it
would be similar to a brute force attack. But choosing such a large parameter $M$ is only
feasible for a very small number of weights.

Updating $\mathbf{p}^E$ as described above has the effect that its elements $p_i$
converge toward 0 or 1 after several rounds, so that finally $M$ equal state vectors with

$$p_i = 0 \Rightarrow s_i = 1, \tag{6}$$
$$p_i = 1 \Rightarrow s_i = 0, \tag{7}$$

are sampled. However, defining

$$s_i^{E*} = \begin{cases} 0 & \text{for } p_i > 1/2, \\ 1 & \text{for } p_i < 1/2, \end{cases} \tag{8}$$

as the most probable state provided $\mathbf{p}^E$, the attack is considered a success as
soon as $\mathbf{s}^{E*} = \mathbf{s}^A$ without regard to whether or not all the $p_i$
have collapsed to 0 or 1.
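
The success criterion can be expressed compactly in code. In this sketch the names `most_probable_state` and `attack_succeeded` are hypothetical, and the tie $p_i = 1/2$, which the definition above leaves open, is resolved to 1 purely as an assumption.

```python
def most_probable_state(p):
    """Most probable bit for each entry, where p[i] approximates P(s_i = 0).

    The case p[i] == 0.5 is not covered by the definition in the text;
    it is mapped to 1 here as an arbitrary choice.
    """
    return [0 if p_i > 0.5 else 1 for p_i in p]


def attack_succeeded(p, s_a):
    """True once the most probable state equals A's state vector."""
    return most_probable_state(p) == list(s_a)


# The attack counts as successful even though no p_i has fully collapsed.
p = [0.9, 0.2, 0.6, 0.05]
guess = most_probable_state(p)  # [0, 1, 0, 1]
```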

In contrast, if one or more $p_i$ have collapsed to the wrong value, E might be unable to
achieve the desired output $\tau^E = \tau^A$ in a later round. Such a failure clearly
indicates that the estimation of some $p_i$ has gone wrong. In order to avoid an infinite
loop in this case, only a finite number of attempts is made to generate $M$ valid samples
$\mathbf{s}^E$. If the limit is reached, the element $p_i$ of $\mathbf{p}^E$ which is
closest to collapse is reset to the neutral hypothesis, $p_i = 1/2$.
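
One possible reading of this reset rule is sketched below. The interpretation of "closest to collapse" as the element with the largest distance from 1/2, the search over all elements rather than only those selected in the current round, and the function name `reset_most_collapsed` are all assumptions of this example.

```python
def reset_most_collapsed(p):
    """Return a copy of p with the entry farthest from 1/2 reset to 1/2."""
    k = max(range(len(p)), key=lambda i: abs(p[i] - 0.5))
    new_p = list(p)
    new_p[k] = 0.5  # back to the neutral hypothesis
    return new_p


p_after = reset_most_collapsed([0.5, 0.9, 0.02, 0.6])
# p_after is [0.5, 0.9, 0.5, 0.6]: the entry 0.02 was closest to collapse
```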

Usually, the algorithm will not be able to guess $\mathbf{s}^A$ correctly in less than
one outer round; therefore we need a mechanism to transfer the information gained during
an outer round into the next one. Let $\mathbf{p}^{E-}$ be the probabilistic state vector
after applying the previous algorithm on all the inner rounds of a whole outer round. In
order to transfer the information the attacker calculates the probability distribution
for the state $\sigma_1^A$ of the first hidden unit conditioned on the probabilistic
state vector $\mathbf{p}^{E-}$ as well as the input $X$ and the matrix $\pi$ for each of
the inner rounds with $\tau^A = \tau^B$. The result is then used to construct the
probabilistic state vector $\mathbf{p}^{E+}$ for the start of the next outer round.

FIG. 1. (Color online) Synchronization time $t_s$ and break time $t_b$ measured in outer
rounds as a function of $N$ for PPMs with $K = 2$, $G = 128$. Symbols denote mean values
and error bars denote standard deviations obtained in 100 runs (even $N$). Lines show the
results of a fit using linear regression as given in Table I. For odd $N$ around 25 out
of 100 runs reaching $t = 30$ had to be aborted and discarded from the data set.

FIG. 2. (Color online) Success probability $P_s$ of the attack as a function of $N$ for
PPMs with $K = 2$. Symbols denote the percentage of successful attacks found in 100
simulations (even $N$). For odd $N$ around 25 out of 100 runs had to be stopped after 30
outer rounds without a clear result, i.e., $t_s > 30$ and $t_b > 30$. These simulations
were not considered for calculating the probability of success $P_s$.

In the following we describe an algorithm to approximate the probability that a single
hidden unit has internal state $\sigma_j = 0$ given $\mathbf{p}^{E-}$ and the
corresponding public information of an inner round. The output $\sigma_j$ depends only on
the number of 1s in the vector local field $\mathbf{h}_j$, which is equal to the scalar
local field $h_j$. Here we approximate the probability distribution
$P(h_j = n \mid \mathbf{p}^{E-}, X, \pi)$ of this quantity by a binomial distribution
which uses the average probability of finding a 1 in $\mathbf{h}_j$ as a parameter:

$$q_j = \frac{1}{N} \sum_{i=1}^{N} \left[ x_{i,j}\, p_{\pi_{i,j}} + (1 - x_{i,j})(1 - p_{\pi_{i,j}}) \right]. \tag{9}$$

Then, the probability

$$P(\sigma_j = 0 \mid \mathbf{p}^{E-}, X, \pi) = \sum_{n=0}^{N/2} P(h_j = n \mid \mathbf{p}^{E-}, X, \pi) \tag{10}$$

of $\sigma_j = 0$ is given by

$$P(\sigma_j = 0 \mid \mathbf{p}^{E-}, X, \pi) = \sum_{n=0}^{N/2} \binom{N}{n} q_j^{\,n} (1 - q_j)^{N-n}. \tag{11}$$

Finally, the attacker stores this result in $\mathbf{p}^{E+}$ whenever $\tau^A = \tau^B$
occurred in the inner round. This procedure succeeded in conveying enough information
from one outer round to the next one.
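
The analytic transfer step can be sketched as follows. This is an illustration under assumptions: the names `prob_sigma_zero` and `next_state_probabilities` are hypothetical, indices in $\pi$ are 0-based, and for odd $N$ the upper summation limit $N/2$ is rounded down.

```python
from math import comb


def prob_sigma_zero(p, pi, X, j):
    """Binomial approximation of P(sigma_j = 0) given p[i] ~ P(s_i = 0)."""
    N = len(X)
    # Average probability that an entry of the vector local field is 1:
    # x = 1 needs weight 0 (probability p), x = 0 needs weight 1 (1 - p).
    q = sum(
        X[i][j] * p[pi[i][j]] + (1 - X[i][j]) * (1 - p[pi[i][j]])
        for i in range(N)
    ) / N
    # sigma_j = 0 whenever at most N/2 entries of the local field are 1.
    return sum(comb(N, n) * q**n * (1 - q) ** (N - n) for n in range(N // 2 + 1))


def next_state_probabilities(p_minus, rounds):
    """Build the next probabilistic state vector from one outer round.

    rounds is a sequence of (pi, X, tau_a, tau_b); only rounds in which
    A and B agreed contribute, using the first hidden unit (j = 0).
    """
    return [
        prob_sigma_zero(p_minus, pi, X, 0)
        for pi, X, tau_a, tau_b in rounds
        if tau_a == tau_b
    ]
```

Since A stores $\sigma_1^A$ as the new state bit, the probability that this bit is 0 directly becomes the corresponding element of the next probabilistic state vector.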

An alternative approach to this task seems to be Monte Carlo sampling of $\sigma_1^A$
conditioned on the final $\mathbf{p}^{E-}$. But in our simulations this method proved to
be prone to failure: Either $\mathbf{p}^E$ was effectively reset or the algorithm could
not generate enough valid weight candidates at some inner round. Thus we developed and
used the analytic approach instead of calculating $\mathbf{p}^{E+}$ by sampling.

N      Time    Slope a              Offset b
Even   t_s     0.495 ± 0.028        2.01 ± 0.28
Even   t_b     0.1275 ± 0.0038      2.180 ± 0.038
Odd    t_s     0.245 ± 0.076        9.02 ± 0.95
Odd    t_b     0.157 ± 0.028        4.33 ± 0.35

TABLE I. Linear regression with model $t = aN + b$ for average synchronization time
$t_s$ and break time $t_b$.

The attack described in this Rapid Communication is often capable of guessing the state
vector $\mathbf{s}^A$ in fewer outer rounds than A and B need to synchronize. This result
was reproduced for many different setups of the synchronizing PPMs, with varying input
vector size and varying state vector size. The usual setup for cryptographic applications
is to use even $N$, since PPMs with odd $N$ synchronize notably slower or sometimes not
at all [15]. However, the algorithm was also tried for odd $N$ for illustrative purposes
and yielded satisfactory results.

As for the technicalities, a sampling size of $M = 10^3$ was chosen. This implies that
for $N = 2, 4$ the algorithm works similarly to a brute force attack, but for large $N$
only a small part of the weight space is sampled, e.g., for $N = 8$ only around 3% of all
possible weight configurations. The absence of a performance drop despite this sparse
sampling highlights the efficiency of the algorithm. The mechanism to prevent the attack
from getting stuck was implemented by resetting one of the elements of $\mathbf{p}^E$
each time that $M^2 = 10^6$ consecutive unsuccessful attempts at generating a valid
weight candidate were reached. Finally, the attack was considered a success as soon as
$\mathbf{s}^{E*} = \mathbf{s}^A$ had been reached. The number of outer rounds needed to
achieve this is called break time $t_b$, which varies randomly depending on the initial
conditions and the course of the key exchange.
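
The quoted coverage can be checked with a short calculation. It assumes $K = 2$, as in the figures, and compares $M = 10^3$ with the roughly $2^{NK-1}$ weight configurations compatible with one observation:

```latex
\begin{align*}
N = 2:&\quad 2^{NK-1} = 2^{3} = 8 \ll M = 10^{3},\\
N = 4:&\quad 2^{NK-1} = 2^{7} = 128 < M = 10^{3},\\
N = 8:&\quad 2^{NK-1} = 2^{15} = 32768, \qquad
       \frac{M}{2^{15}} = \frac{1000}{32768} \approx 0.031 .
\end{align*}
```

For $N = 2$ and $N = 4$ the sample is larger than the set of compatible configurations, which is why the attack behaves like a brute force search there, while for $N = 8$ the sample size corresponds to about 3% of that set.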

Figure 1 shows that the mean values of synchronization time $\langle t_s \rangle$ as well
as break time $\langle t_b \rangle$ grow linearly with increasing size $N$ of the input
vectors. For all cases presented here we find that the attacker is faster than the two
partners on average, $\langle t_b \rangle < \langle t_s \rangle$. Additionally, linear
regression results shown in Table I indicate that $\langle t_b \rangle$ grows slower than
$\langle t_s \rangle$, so that increasing $N$ does not improve the security of the
PPM-based key exchange.
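
The fitted lines from Table I can be evaluated directly to compare both times. The sketch below uses only the central values of the fit parameters and ignores their uncertainties; the function name `fitted_times` is hypothetical, and evaluating the fit outside the range of $N$ covered by the simulations is an extrapolation, not a result of the paper.

```python
# Central values of slope a and offset b from Table I, model t = a*N + b.
FITS = {
    "even": {"t_s": (0.495, 2.01), "t_b": (0.1275, 2.180)},
    "odd": {"t_s": (0.245, 9.02), "t_b": (0.157, 4.33)},
}


def fitted_times(N, parity="even"):
    """Return the fitted mean (t_s, t_b) in outer rounds for input size N."""
    a_s, b_s = FITS[parity]["t_s"]
    a_b, b_b = FITS[parity]["t_b"]
    return a_s * N + b_s, a_b * N + b_b


for N in (2, 8, 16):
    t_s, t_b = fitted_times(N)
    print(f"N={N:2d}  t_s={t_s:.2f}  t_b={t_b:.2f}  gap={t_s - t_b:.2f}")
```

Because the slope of $t_b$ is smaller than that of $t_s$ for both parities, the gap between the two fitted lines widens as $N$ grows.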

Synchronization with odd $N$ is much slower than for even $N$. Only runs with $t_b < 30$
or $t_s < 30$ were considered here to reduce computational costs. This condition also
excludes failed synchronization attempts caused by reaching a stable antiparallel weight
configuration [15], which can only happen if $N$ is odd.

In Fig. 2 the performance of the algorithm is presented in terms of the probability of
success of the attack. Many more setups are examined here. Regarding cases with even $N$,
the performance of the algorithm generally increases as $N$ or $G$ become larger. For
nearly all configurations shown here the success probability $P_s$ is above 80% and it
actually reaches 100% in many situations. Odd $N$ is considerably more difficult for the
attacker, but nevertheless the success probability $P_s$ remains larger than 50%. These
values, however, have been obtained for single runs of our algorithm on each data set. As
the method is nondeterministic due to Monte Carlo sampling in each inner round, repeating
it on the same observations should lead to even more success.

Consequently, the results clearly show that the PPM-based neural key-exchange protocol
using the parameter values $K$, $N$, and $G$ analyzed in this Rapid Communication is not
secure enough for any cryptographic application. Furthermore, there is no indication that
increasing the sizes of input or state vectors would reduce the success probability and
lead to a secure configuration.

In contrast, the complexity of successful attacks on TPM-based neural cryptography
increases exponentially with the number $2L + 1$ of possible weight values, but the
effort of the partners grows only proportional to $L^2$ 0]. Here $L$ has the same effect
as the key size in encryption algorithms, which allows one to balance speed and security.
While the probabilistic attack has not been tested on TPM-based neural cryptography, it
is quite likely that the same scaling law for $L$ applies to its success probability. But
in order to answer this open question we are going to implement and analyze such
probabilistic attacks also for TPMs.

The same question could be asked regarding the security of chaos cryptography [21]-[23],
which is based on a similar synchronization principle [24]. Consequently, probabilistic
attacks should be envisioned and tested there, too. Nevertheless, the specificity of the
present implementation suggests that further development is needed for attacks on chaotic
cryptography.

L.F.S. acknowledges the financial support of Fundacion Pedro Barrie de la Maza and
funding Grant No. 01GQ1001B.